Selection of Relevant and Non-Redundant Feature Subspaces for Co-training

Authors

  • Yusuf Yaslan
  • Zehra Cataltepe
Abstract

On high-dimensional data sets, choosing subspaces randomly, as in the RASCO (Random Subspace Method for Co-training; Wang et al., 2008) algorithm, may produce diverse but inaccurate classifiers for Co-training. To remedy this problem, we introduce two algorithms for selecting relevant and non-redundant feature subspaces for Co-training. The first algorithm, relevant random subspaces (Rel-RASCO), produces subspaces by drawing features with probability proportional to their relevance, measured by the mutual information between each feature and the class labels. We also modify a successful feature selection algorithm, Minimum Redundancy Maximum Relevance (MRMR), for feature subset selection and introduce the Prob-MRMR feature subset selection scheme. Experiments on 5 datasets show that the proposed algorithms outperform both RASCO and Co-training in terms of the accuracy achieved at the end of Co-training. A theoretical analysis of the proposed algorithms is also provided.
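The relevance-weighted drawing described for Rel-RASCO can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the assumption of discrete-valued features, sampling without replacement within a subspace, and the small constant added to avoid zero probabilities are all choices made here for illustration only.

```python
import numpy as np


def mutual_information(x, y):
    """Empirical mutual information (in nats) between two discrete 1-D arrays."""
    x_vals, x_idx = np.unique(x, return_inverse=True)
    y_vals, y_idx = np.unique(y, return_inverse=True)
    # Build the joint frequency table, then normalise it to a distribution.
    joint = np.zeros((len(x_vals), len(y_vals)))
    np.add.at(joint, (x_idx, y_idx), 1.0)
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)  # marginal of x, shape (|x|, 1)
    py = joint.sum(axis=0, keepdims=True)  # marginal of y, shape (1, |y|)
    nz = joint > 0  # skip empty cells to avoid log(0)
    return float(np.sum(joint[nz] * np.log(joint[nz] / (px @ py)[nz])))


def relevance_weighted_subspaces(X, y, n_subspaces, subspace_size, seed=0):
    """Draw feature subspaces with probability proportional to relevance.

    Relevance is the mutual information between each (discrete) feature
    column of X and the labels y. Hypothetical helper, for illustration.
    """
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    relevance = np.array(
        [mutual_information(X[:, j], y) for j in range(n_features)]
    )
    # Assumption: a tiny offset keeps every feature drawable.
    relevance = relevance + 1e-12
    probs = relevance / relevance.sum()
    # Assumption: features are not repeated inside one subspace.
    return [
        np.sort(rng.choice(n_features, size=subspace_size, replace=False, p=probs))
        for _ in range(n_subspaces)
    ]
```

Each returned index array could then be used to train one classifier of the ensemble on `X[:, indices]`. Continuous features would need to be discretized first, or scored with a different mutual information estimator.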


Similar resources

A New Hybrid Framework for Filter based Feature Selection using Information Gain and Symmetric Uncertainty (TECHNICAL NOTE)

Feature selection is a pre-processing technique used for eliminating the irrelevant and redundant features which results in enhancing the performance of the classifiers. When a dataset contains more irrelevant and redundant features, it fails to increase the accuracy and also reduces the performance of the classifiers. To avoid them, this paper presents a new hybrid feature selection method usi...


Audio Genre Classification with Semi-Supervised Feature Ensemble Learning

Widespread availability and use of music have made automated audio genre classification an important field of research. Thanks to feature extraction systems, not only music data but also their features have become readily available. However, hand-labeling a large amount of music data is time-consuming. In this study, we introduce a semi-supervised random feature ensemble method for audio ...


A hybrid filter-based feature selection method via hesitant fuzzy and rough sets concepts

High dimensional microarray datasets are difficult to classify since they have many features with small number of instances and imbalanced distribution of classes. This paper proposes a filter-based feature selection method to improve the classification performance of microarray datasets by selecting the significant features. Combining the concepts of rough sets, weighted rough set, fuzzy rough se...


A New Iterative Neural Based Method to Spot Price Forecasting

Electricity price prediction has become a major topic of discussion in competitive markets under deregulated power systems. However, the distinctive characteristics of electricity prices, such as non-linearity, non-stationarity and a time-varying volatility structure, present several challenges for this task. In this paper, a new forecast strategy based on the iterative neural network is proposed for Day-ahead price...


Online Streaming Feature Selection Using Geometric Series of the Adjacency Matrix of Features

Feature Selection (FS) is an important pre-processing step in machine learning and data mining. All the traditional feature selection methods assume that the entire feature space is available from the beginning. However, online streaming features (OSF) are an integral part of many real-world applications. In OSF, the number of training examples is fixed while the number of features grows with t...




Publication date: 2009